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Abstract. The analysis of spectral signals for features that represent physical phenomenon 
is ubiquitous in the science and engineering communities. There are two main approaches 
that can be taken to extract relevant features from these high-dimensional data streams. 
The first set of approaches relies on extracting features using a physics-based paradigm 
where the underlying physical mechanism that generates the spectra is used to infer the 
most important features in the data stream. We focus on a complementary methodology 
that uses a data-driven technique that is informed by the underlying physics but also has 
the ability to adapt to unmodeled system attributes and dynamics. We discuss the fol- 
lowing four algorithms: Spectral Decomposition Algorithm (SDA), Non-Negative Matrix 
Factorization (NMF), Independent Component Analysis (ICA) and Principal Components 
Analysis (PCA) and compare their performance on a spectral emulator which we use to 
generate artificial data with known statistical properties. This spectral emulator mimics 
the real-world phenomena arising from the plume of the space shuttle main engine and can 
be used to validate the results that arise from various spectral decomposition algorithms 
and is very useful for situations where real-world systems have very low probabilities of fault 
or failure. Our results indicate that methods like SDA and NMF provide a straightforward 
way of incorporating prior physical knowledge while NMF with a tuning mechanism can give 
superior performance on some tests. We demonstrate these algorithms to detect potential 
system-health issues on data from a spectral emulator with tunable health parameters. 


1. Introduction 

The analysis of spectral signals is one of the classic problems in physics. Numerous 
references, dating back to the 17th century have discussed optical spectra, and then with the 
deeper understanding of quantum mechanics, the relationship between chemical elements 
and spectral energy. For the purposes of this paper, we model the observed spectral data as 
a time series of spectra Y(A, N). The columns in this matrix correspond to the observations 
of the spectra at a given time N. The rows correspond to the wavelengths at which the 
spectral observations are made. The spectral components at a given time are a vector of 
observations of length A where A depends on the resolution of the spectral data acquired by 
the detector. In our current application, A is typically 1061. 

The problem that we address in this paper is to develop and test approaches to extract 
relevant system-health information from Y(A,N). We advance this by studying various 
matrix factorization techniques which result in signals that are of lower dimension and that 
can contain relevant health information. The standard approach to solve this problem is to 
use Principal Components Analysis (PCA) which results in a factorization of the matrix into 
a set of m orthogonal basis vectors where m < C A and m is chosen from the eigenspectrum 
of Y. As we will see in our analysis, these results can be very useful in understanding 
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the underlying data generating process. However, the PCA algorithm suffers from some 
shortcomings that require us to make further algorithmic advances. These deficiencies will 
be discussed later in this manuscript. 



Figure 1. This ’waterfall’ figure shows a typical spectral signal generated by 
our Spectral Emulator, which is a software program that generates spectral 
time series with known statistical properties. The output of the Spectral Em- 
ulator has some properties that are similar to emissions spectra from liquid 
propulsion systems. The figure shows significant structure in the lower wave- 
length bands and has been seeded with known wavelength locations for 10 
elements. Notice that the spectral signatures are time varying in nature. 

Figure 1 shows a few columns in a Y matrix that is generated from a Spectral Emulator 
that is discussed later in this paper. The key properties of spectral signals is that they 
exhibit variations in multiple wavelength bands (left-hand axis) that can be correlated due 
to the underlying data generating process. The large signature on the left-hand side of the 
figure is often indicative of a process that covers a large band of wavelengths. Although from 
a statistical perspective Y can be modeled as a multivariate time series, it is helpful to note 
that the spectral properties are highly correlated with one another. 

The application of algorithms for spectral decomposition to systems health manage- 
ment issues arises as follows. Most data-driven anomaly detection algorithms are not able to 
directly operate on high dimensional data sets because of the so-called curse of dimensional- 
ity [7]. The spectral decomposition methods we discuss here result in significant dimension- 
ality reduction while preserving a significant amount of systems health related information 
as measured by the performance of the detection algorithm. This paper shows, however, that 
standard dimensionality reduction techniques, such as PCA must be applied judiciously in 
situations where the amount of data is significant or when one has a priori knowledge. 
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2. Spectral Emulator 

The emulator is designed to generate a set of time series of spectra similar to what 
is measured with a spectrometer in optical plume analysis in liquid propulsion engines. It 
does not employ a physics based model for data generation, but instead the intent is to 
emulate similar signals that can be found in optical plume data with the assumption that 
the basis vectors are linearly mixed. The resulting data is a linear combination between a 
set of spectral basis vectors S and the corresponding temporal profiles a, with a noise factor 
77 . The linear combination is as follows: Y = Sa + 77 . The spectral basis vectors consist of 
three distinct components 

1. ) The estimated broadband spectral profile of a hydroxide burn. 

2. ) The spectral profile due to reflective particle scattering. 

3. ) Ten unique severity one elemental wavelengths profiles. 

The hydroxide (OH) component represents the spectral features produced from the 
burning of pure hydrogen and oxygen during engine operation. It makes up the majority of 
the energy in the signal and has a broadband spectral profile. The emulator recreates this 
spectral feature by building a higher order polynomial function with coefficients that are 
allowed to vary from run to run in such a way as to have a similar patter as the OH burn 
found in the real data. The corresponding time profile for the OH component can either be 
generated as a linear slope or exponential profile with intermittent amplitude changes or any 
combination. 

The background scatter profile attempts to represent the phenomena that occurs when 
particles produce radiation which is reflected amongst the rest of the particles in the plume. 
This creates a background noise that has a periodic characteristic in the spectral domain [1, 
2], The emulator recreates this spectral feature by using a weighted combination of sines 
and cosines to produce the desired effect. The corresponding background scatter time profile 
is a positive random distribution over time. 

Table 1. The table shows the prominent spectra lines for SSME elements in 
the spectral range of 320 to 426 nm [17]. 


Elements 

Wave lengths 

Nickel 

341.5, 345.9, 346.2, 349.3, 351.6, 352 . 6 , 362.0 

Iron 

372.0, 373.7, 374.6, 375.0, 382.1, 382.6, 385.7, 386 . 1 , 388 . 7 , 388.7 

Chromium 

357.9, 359.4, 360.6, 425.6 

Cobalt 

341.3, 345.0, 345.5, 346.6, 347.5, 350.3, 351.5, 353.1, 357.6, 387.4 

Copper 

324 . 8 , 327.4 

Manganese 

403.4 

Calcium 

422.6 

Aluminium 

396.1 

Silver 

328.0, 338.3 

Magnesium 

370 . 2 , 371 . 9 , 380 . 8 , 383 . 3 , 384.5 


The elemental profiles each have a set of primary and secondary wavelengths that corre- 
spond to known severity one list elements found during engine operation (Ni, Fe, Cr, Co, Cu, 
Mn, Ca, Al, Ag, and Mg) [ 6 ]. The spectral profile is recreated by generating a high signal 
to noise ratio at the peaks of the primary and secondary wavelengths for a given element 
and assigning positive uniform noise to the remaining wavelengths in the spectrum. The 
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corresponding time profiles for the elements have intermittent spikes and have a baseline 
close to zero. This behavior has been documented in previous publication [1, 2, 5, 4] and 
also observed in the real data. 

The element spectral basis vectors are individually linearly combined with appropriate 
time profiles to form a time series of spectra Y for each element. Each Y is then normalized 
to have unit energy. The OH and background scatter spectral basis vectors are combined and 
normalized with their corresponding time profiles in the same manner. The matrix elements 
for all Y’s are then combined as follows: 



Yi represents the time series of spectra matrix for each component including the OH, the 
background scatter, and each of the ten element components, ou corresponds to the energy 
weight for each component. The element components are each given weights on the order of 
1CT 3 , the background scatter at ICO 2 and the OH component contains the remaining energy. 

After combining all contributions and their appropriate energy weighting from the in- 
dividual Y matrices the energy of Y Fina i has unit energy. The resulting Y Fina i matrix has 
dimensions A x N where A is the number of wavelengths in the spectral domain and N is 
the number of time samples. At any given time sample the spectra contains a mixing of all 
components. The decomposition techniques addressed in this paper attempt to extract these 
basis vectors and isolate the element components. Unlike test stand data the emulated data 
contains known ground truth for all samples in time that correspond to the element burns 
and therefore we can compute detection rates, which are reported in the results section. 

3. Decomposition Algorithms 

The main idea behind the decomposition algorithms discussed here is to reduce the 
number of dimensions in the observed signal to extract features that can be used for anomaly 
detection. Ideally the features would be indicative of the health of the system under study. 
For our examples, we assume that we are studying data from a liquid propulsion system 
such as the space shuttle main engine. These extracted signals should correspond to known 
chemical species in the propulsion system. Higher concentrations of certain metals, such 
as Cr, Ni, and Fe can be indicative of adverse conditions in the engine [17]. Thus, these 
algorithms must generate interpretable signal decompositions so that users can have a clear 
understanding of the underlying physical mechanism. This ’interpretability’ requirement 
is not necessarily achieved by standard statistical algorithms. This paper overviews some 
key innovations in the statistical machine learning community that can be useful for this 
application domain 1 . 

3.1. Spectral Decomposition Algorithm. The approach that we take to decompose the 
spectral time series Y(A,N) utilizes methods in the blind source separation literature [11], 
In so-called blind source separation problems, we assume that a set of stationary signals S 
is mixed through a linear mixing matrix a. The result of this mixing matrix is the observed 
signal Y [16]: 

1 More detailed information on these algorithms and Opad application is available at Dashlink website 

(http: / / dashlink.arc.nasa.gov/) . 
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m 


(2) 

y (t) = Siat + T) t 

i=\ 

(3) 

Y = Sa + rj 


In this formulation, y (t) is a column vector of size A x 1, Si is a vector of A x 1 and a t 
is a vector of size m x 1 and we assume that there are a total of m stationary components 
Si. In this formulation, we need to solve for S and a given the Y vectors. The procedure to 
do this decomposition is given in [16] and is called the Spectral Decomposition Algorithm 
(SDA). SDA works by assuming a random starting point for S, computing a and then 
recomputing a given the current estimate of S. The update equations are based on a least 
squares solution to the problem and provides fast convergence to a solution and are given 
in the cited reference. The cost function that is minimized in SDA is \Y — Sa|| J which is 
solved using an unconstrained optimization procedure. 

This algorithm features an easy way to incorporate prior knowledge. For example, 
suppose that one knows the spectral emission lines for m i elements under study. In this 
case, these m, spectral lines can be encoded into the initial guess of the matrix S. The first 
estimate of a then will result in the optimal (in the least squares sense) result given the initial 
guess. Subsequent guesses will lead to further refinements in the initial estimates of S with 
additional signals being estimated if m > rrii. With a random initialization, this algorithm 
converges to a set of orthogonal stationary signals. While there are many solutions to the 
linear model show in Equation 3, SDA is particularly fast and flexible in its formulation, 
thus providing an ideal model for decomposing spectra. 

A key weakness of the algorithm, however, is that it assumes that the mixing between S 
and a is linear. In many real-world cases, it may be the case that a nonlinear mixing occurs. 
Depending on the nature of the nonlinearity, SDA may not correctly capture the appropriate 
components. A nonlinear form for this mixing can be expressed as Y = < F(S', a) + ?/. We 
have solved the problem for $ being a linear operator. 

3.2. Principal Components and Factor Analysis. The solution can be shown to be a 
variant of the famous PCA algorithm [10] developed originally by Hotelling in the 1930’s. The 
principal components algorithm computes an orthogonal decomposition of the correlation 
matrix produced by the matrix Y . In this computation we define the correlation matrix 
E = (Y — j Y) t (Y — jF), where Y is the mean of the data in the columns of Y and j is a 
vector of size N x 1 where we assume that we have N spectral samples. A diagonalization 
of this matrix yields the principal components: 

(4) E = V t /YV 

where first m columns of the matrix V correspond to the largest eigenvalues in the diago- 
nal matrix A. These eigenvectors can be easily shown to span the directions of maximum 
variance in the data matrix Y. This specihc and unique property of PCA makes the station- 
ary signals easy to interpret from a mathematical perspective. However, these components 
may not be easily interpretable from the point of view the data generating process. The 
PCA algorithm cannot be easily initialized with prior knowledge since the extracted signals 
are uniquely determined by Equation [?]. PCA, as in the case of SDA, does not extract 
appropriate signals in cases with nonlinear mixing. 
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Input: Iaxjv, m (desired rank),SA Xm , a mx jv and Q (stopping criteria) 
Step 1: Randomly intialize S, a with positive values. 

Step 2: While(not Q) 

a) Update a:=a. * (S T U)./(S 7 Sa); 

b) Update S:=S. * (Ua T )./(Saa T ); 
end 

Output: S, a 


Figure 2. Steps of Standard NMF Algorithm. 

In the statistical community, there is a variant of PCA known as Factor Analysis, which 
is widely used in the social sciences. In Factor Analysis, the matrix Y is decomposed as 
Y = Sa + Si where the factors S and a are assumed to have zero mean, orthogonal and 
of unit length (orthonormal). There are additional constraints placed on S and a in this 
decomposition. Once a decomposition is performed, it is possible to rotate the resulting 
factors S via a rotation matrix. This allows the analyst to identify features that may be 
more interpretable [10] [14]. 

3.3. Non Negative Matrix Factorization. In SDA and PCA, the matrix decompositions 
allow the elements of S and a to be either positive or negative. However, the spectral data 
that is observed in Y is always non-negative. Recently [9, 8, 12] a new matrix decomposition 
algorithm has been developed called non negative matrix factorization (NMF) which finds 
a decomposition Y = Sa such that all the elements of S and a are non negative. This 
decomposition preserves an important property of the spectral data and can lead in some 
cases to superior results. 

NMF minimizes the squared reconstruction error C = | |U — Sa| | 2 given the constraints 
that S and a contain non negative values. In some variants of the algorithm, it is possible 
to place a sparseness constraint on the solution matrices. This can lead to better and more 
interpretable solutions [9, 8]. Another attractive feature of NMF is that it converges rapidly 
and can be easily interpretablc for some applications. The pseudo code of some variants NMF 
algorithms using various updating rules can be obtained in the following review literature [15]. 
Like SDA, this constrained optimization problem leads to an iterative algorithm to update 
S and a. The standard NMF algorithm is given in Figure 2. 

3.4. Non Negative Matrix Factorization with Energy Minimization. We explored 
two novel variants of NMF that allows us to impose a further constraint on the energy (i.e., 
the sum of the squared values in the components of S and a). These variants can be captured 
through the following two optimization functions: 

(5) C x 

( 6 ) C 2 

For cost function C\, the objective is to minimize the squared reconstruction error 
(Euclidean distance) given the constraints non-negativity constraints. The penalty function 
includes a function section term that represents the energy of the hidden components in the 
spectral domain. The constant ay is user specified and controls the relative weight of this 
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Input: Fax a, m (desired rank),S Axm , a mxN , Q (stopping 
criteria), aq and 0:2 (regularization parameter) 

Step 1: Intialize S,a. 

Step 2: While(not Q) 

a) Update a:= ^ ^ 


b) a = a. 


(S T S + a 2 I) ’ 
a >= 0); 

(a T F) . 


c) Update S:=- 

(aa J + aq/) 

d) S = S. *(S >= 0); 


end 

Output: S, a 


Figure 3. Pseudo code for NMF Algorithms with alternating least squares 
update using both reglarization parameters as shown in Equation 6. 


constraint in the overall optimization problem. The presence of this second term creates a 
tradeoff between the smoothness of the S components with the reconstruction error. The 
optimization algorithm will tend to minimize first term (Euclidean distance) of C\ while 
minimizing the 2nd term for a given aq. Therefore, while running the optimization at each 
iteration, the optimization algorithm tends to scale down S while scaling up a in ruturn, so 
that the product of these two terms always stays the same. In the existing literature, the 
way this problem is handled is by rescaling S and a after each iteration [9]. This means, 
after every iteration, each column of a is normalized to unit length followed by an update of 
S. 

Cost function C 2 has similar properties to Cj with the added constraint that the energy 
of a is to be minimized as well. This results in enforcing a smoothness constraint on both 
S and a while potentially increasing the reconstruction error. For this case, since both 
terms are included in the optimization function, no rescaling is necessary. The pseudo code 
of NMF that correspond to C' 2 cost functions is given in Figure 3. Here the update rule 
is obtained from the least square solution of the derivative of the objective function with 
respect to S or a. This method is known as “NMF with least square update” and further 
details of this approach can be obtained in the some of the very recent literatures [3, 13]. 
In the current scope of this study, we intend to demonstrate the applicability of the above 
mentioned blind source algorithms to analyze high dimensional spectral time series Y(t, A) 
in order to detect the presence of the element burns in the wavelength-time plane as an 
indicative of the degradation of the systems heath. In this context it is worth to mention 
here that the choice of the second variants of NMF (as given in Figure 3) is very instinctive 
as we are searching for the sparse failure profiles corresponding to each element from the 
severity list. 


3.5. Independent Components Analysis. Independent Components Analysis (ICA) [7] 
has received wide-spread attention as a new method of signal decomposition. It assumes 
that the signal matrix Y is a superposition of components that are statistically independent 
and non-Gaussian. 
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Table 2. This table shows the Area under an ROC curve for SDA, PCA, 
and NMF for three different detection thresholds. Here o is the standard 
deviation of the reconstructed Y matrix for each algorithm and an & a 2 are the 
regularization parameters. Note that the SDA algorithm and PCA algorithms 
have very similar performance. 


Algorithms 

Area Under ROC 

Time 

Complexity 

(Seconds) 

Threshold 3 -a 
mean std 

Threshold 5 -o 
mean std 

Threshold 10-cr 
mean std 

SDA 

0.59 

0.006 

0.80 

0.006 

0.93 

0.009 

11 

PCA 

0.59 

0.000 

0.80 

0.000 

0.94 

0.000 

11 

1CA 

0.59 

0.004 

0.76 

0.005 

0.75 

0.012 

25 

NMF (ai,a2) 

0.62 

0.022 

0.81 

0.038 

0.86 

0.106 

1867 

NMF (prior) 

0.59 

0.016 

0.78 

0.008 

0.83 

0.031 

3577 

SDA (prior) 

0.59 

0.000 

0.80 

0.000 

0.93 

0.000 

7 




Time axis (sample points) 


FIGURE 4. The figure shows the true failure profiles corresponding to the OH 
component and the element burns, plotted across time. 


4. Results 

Table 2 represents a comparative study on the peformance of all the four different 
algorithms on a spectral data set (T) which has a dimension of 1061 and 1000 instances. 
The true failure profile of the hydroxide (OH) and element burns corresponding to test set 
(y) has been shown in Figure 4. As mentioned earlier, that the outcome of any of these 
decomposition algorithms is a set of basis vectors and their corresponding failure profiles. 
In this study, we have dcliberatly extracted 12 hidden components because we assume that 
the first 12 basis vectors will consist most of the spectrum information regarding the OH 
burn, element burns and background scatter. For analysis purpose, we first reconstructed a 
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data matrix Y true with only the true element burns and their time profiles. This new data 
matrix serves as a ground truth representing the varying energy profile corresponding to the 
element burns over time. Similiar data matrix Y a i go has been constructed using the profiles 
(spectrum and time) of the decomposed element burns extracted from eacho individual 
algorithm. Thereafter a detection threshold has been imposed on Y true and Y a j go to calculate 
the ’’area under the ROC curve’”, a metric that has been used to evaluate the performance 
of the algorithm in this study. For each algorithms, all the readings corresponding to 50 runs 
have been recorded. The numbers shown in Table 2 represent the mean and the standard 
deviation calculated over those 50 runs for 3 different detection threshold. The right most 
column of the table represents the mean time complexity of each algorithm. 

From the above table it can be seen that both SDA and PCA exibit similiar performance 
and emerge as winners. We have also observed that standard NMF was unable to seperate 
the OH burns from the element burns and all the 12 components extracted by standard 
NMF consists of OH profile. This is understandable as standard NMF would always try to 
minimize the reconstruction error and it will try to achieve this by distributing the energy of 
the most dominent feature (in this case the OH burn) over all the 12 extracted components. 
The NMF algorithm with sparsity factors as proposed by Hoyer [8] was unable to provide 
with some meaningfull solution in this case, as the data matrix (Y) is composed of both non- 
sparse OH profile and sparse element profiles in wavelength-time domain. However NMF 
with regularization parameter was able to present a much better performance compared to 
standard NMF with/without sparsity and ICA while detecting the element burns. 

In a seperate study, we have incorporated domain knowledge in some of the algorithms 
like SDA and NMF. This was done by intializing the first 10 components of S with digitalized 
signal (of 1-s and 0-s) having peaks at the primary and secondary wavelengths corresponding 
to all 10 element while the rest 2 were initialized randomly. The result showed no particular 
improvement in the performance of SDA with additional domain knowledge. However there 
was a noticable improvement in the time complexity. While standard NMF (Figure 2) did 
not work in the first place but with the domiain knowledge included, the same algorithm 
proved to be successful! in detecting the element burns effectively. 


5. Conclusions 

This paper has reviewed some of the most recent and popular blind source seperation 
techniques to generate low dimensional signals, which provide the best description of the 
hidden features associated with the system states. In this paper we have discussed the 
use of standard algorithms like SDA, PCA, ICA and NMF to extract hidden features as a 
necessary step towards anomaly detection on high dimensional data sets and finally provided 
a comparative study of the performances of these methods under different detection criteria. 
We have also described a spectral emulator that provides a good approximation of some 
of the events arising from the plume of the space shuttle main engine and this also serves 
as a good source of high dimensional data sets. The above mentioned algorithms have 
demonstrated the ability to detect the presence of element burns and separate them from 
OH profiles. Furthermore the detection method applied was based on a fixed threshold, 
which leaves room for improvement in future work where more advanced machine learning 
algorithms that are not simply amplitude based can do much better at detecting similar 
types of anomalies. 
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(a) Spectrum profile 
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(b) Time profile 

Figure 5. The figure shows the outcome of the SDA algorithm without any 
prior knowledge. Figure 5 (a) represents the extracted bisis vectors while Fig- 






















(b) Time profile of individual basis vector 

Figure 6. The figure shows the outcome of the NMF algorithm with regular- 
ization terms as expressed in Equation 6. In this analysis, the regularization 
























